
feat(optimizer): [0/N] Optimizer Data Model #527

Open
mkuchenbecker wants to merge 3 commits into linkedin:main from mkuchenbecker:mkuchenb/optimizer-0


@mkuchenbecker (Collaborator) commented Apr 3, 2026

Optimizer Stack

PR      Content
#527    Data Model (this PR)
#530    Database Repos
#531    REST service
#533    Analyzer app
#534    Scheduler app
#tbd    Spark BatchedOFD app
#tbd    Infra, docker-compose, smoke test

Summary

PR 0 of N in the optimizer stack.
Overall Project
Service Design doc.

Introduces the MySQL data model for the optimizer service module.


Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Testing Done

  • Manually Tested on local docker setup. Please include the commands run and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

This PR contains only the data model (entities, DTOs, converters). Repository tests follow in PR 1. Verified:

  • ./gradlew :services:optimizer:compileJava passes
  • ./gradlew compileJava (full project) passes with no regressions
  • Spotless formatting passes

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

Introduces the optimizer service module with:
- MySQL/H2 schema for table_operations, table_stats, table_stats_history,
  and table_operations_history
- JPA entities with JSON column support (vladmihalcea hibernate-types)
- All model/DTO/enum types: OperationType, OperationStatus, TableStats,
  CompleteOperationRequest, JobResult, OperationMetrics, etc.
- JPA AttributeConverters for JobResult and OperationMetrics JSON columns
- MapStruct mapper (OptimizerMapper) for entity→DTO conversion
- Spring Boot application shell and build wiring (settings.gradle,
  build.gradle dockerPrereqs)

No repositories, controllers, or service layer yet; those follow in
subsequent PRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove OperationMetrics class and converter; stats are read
  directly from table_stats instead of duplicating into operations
- Remove orphanFilesDeleted/orphanBytesDeleted from history entity,
  DTO, and schema; operation-specific data belongs in the result JSON
- Add addedSizeBytes to CommitDelta for tracking write volume
- Fix OperationType javadoc to describe current state, not roadmap
- Fix TableOperationsHistoryRow javadoc: written on operation
  complete, not by Spark app directly
- Add field comments to all DTOs and request objects

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
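The commit above adds `addedSizeBytes` to `CommitDelta` for tracking write volume. A minimal sketch of how such a field could be aggregated, assuming a one-field record as a stand-in for the real Lombok class; `WriteVolume` and `totalBytesWritten` are hypothetical names, not part of this PR:

```java
import java.util.List;

// Hypothetical stand-in: only the addedSizeBytes field comes from this PR.
record CommitDelta(long addedSizeBytes) {
  CommitDelta {
    // Reject negative sizes so a malformed delta cannot shrink the total.
    if (addedSizeBytes < 0) {
      throw new IllegalArgumentException("addedSizeBytes must be >= 0");
    }
  }
}

// Hypothetical helper, not part of this PR.
final class WriteVolume {
  private WriteVolume() {}

  /** Sums addedSizeBytes across commits; Math::addExact throws on long overflow. */
  static long totalBytesWritten(List<CommitDelta> deltas) {
    return deltas.stream().mapToLong(CommitDelta::addedSizeBytes).reduce(0L, Math::addExact);
  }
}
```

Since stats are read from `table_stats` rather than copied into operations, a sum like this would be computed on read instead of stored.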
@mkuchenbecker changed the title from "feat(optimizer): add data model — schema, entities, DTOs, converters" to "feat(optimizer): [1/N] data model" on Apr 6, 2026
These fields never belonged in the data model — remove them at the
source rather than adding then deleting in a later PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mkuchenbecker changed the title from "feat(optimizer): [1/N] data model" to "feat(optimizer): [1/N] Optimizer Data Model" on Apr 6, 2026
@mkuchenbecker marked this pull request as ready for review on April 6, 2026 at 19:46
@mkuchenbecker changed the title from "feat(optimizer): [1/N] Optimizer Data Model" to "feat(optimizer): [0/N] Optimizer Data Model" on Apr 6, 2026

/** Terminal states for a completed Spark maintenance job. */
public enum OperationHistoryStatus {
SUCCESS,
Review comment (Member):

We should keep the existing statuses such as canceled, queued, etc. These are valid statuses, since jobs sometimes cannot be submitted due to GGW/YARN issues.
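A hedged sketch of what the reviewer's suggestion could look like, assuming the enum keeps non-terminal states alongside terminal ones. `SUCCESS` is from the PR and `QUEUED`/`CANCELED` come from the comment; `FAILED` and the `isTerminal()` helper are assumptions:

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical variant of OperationHistoryStatus; values beyond SUCCESS are assumptions. */
enum OperationHistoryStatus {
  QUEUED, // not yet running, e.g. submission blocked by a GGW/YARN issue
  SUCCESS,
  FAILED,
  CANCELED;

  // Enum constants are initialized before this field, so EnumSet.of is safe here.
  private static final Set<OperationHistoryStatus> TERMINAL =
      EnumSet.of(SUCCESS, FAILED, CANCELED);

  /** True when no further state transition is expected for the job. */
  boolean isTerminal() {
    return TERMINAL.contains(this);
  }
}
```

Keeping the terminal/non-terminal split inside the enum would let the history table record jobs that never ran while callers can still ask whether a job is finished.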

private String jobId;

/** Reserved for future per-operation metadata; currently unused. */
private String metrics;
Review comment (Member):

Can we use a class here to capture more info, or do we plan to store a JSON string?

/** Same UUID as the originating {@code table_operations.id}. Set by the caller; not generated. */
@Id
@Column(name = "id", nullable = false, length = 36)
private String id;
@abhisheknath2011 (Member), Apr 8, 2026:

Looks like this UUID is generated as part of job submission?
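A minimal sketch of the id hand-off described in the javadoc above, assuming the id is minted once when the `table_operations` row is created (whether that happens at job submission is the open question here). `TableOperation` and `TableOperationsHistory` are hypothetical stand-ins, not the PR's entities. `UUID.toString()` yields the 36-character canonical form, which matches the `VARCHAR(36)` column:

```java
import java.util.UUID;

// Hypothetical stand-ins for the PR's entities; only the field names mirror the PR.
record TableOperation(String id, String tableUuid) {
  /** Mints the operation id once, when the operation row is created (assumption). */
  static TableOperation create(String tableUuid) {
    return new TableOperation(UUID.randomUUID().toString(), tableUuid);
  }
}

record TableOperationsHistory(String id, String tableUuid) {
  /** Written on completion: the caller copies the operation id instead of generating one. */
  static TableOperationsHistory onComplete(TableOperation op) {
    return new TableOperationsHistory(op.id(), op.tableUuid());
  }
}
```

Reusing the id makes the history row joinable to its originating operation without a separate foreign-key column.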

private String tableUuid;

@Column(name = "database_name", nullable = false, length = 255)
private String databaseName;
Review comment (Member):

This seems to be 128 characters long in the current prod schema.

@Column(name = "database_name", nullable = false, length = 255)
private String databaseName;

@Column(name = "table_name", nullable = false, length = 255)
@abhisheknath2011 (Member), Apr 8, 2026:

Table name is also 128 characters long, but we can double-check.
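If the entities keep `length = 255` while the prod columns are 128 characters, names of 129 to 255 characters would pass the entity mapping but not fit the prod column. A hypothetical guard, assuming the 128 limit quoted in these comments; `NameLengthGuard` is illustrative only and not part of this PR:

```java
// Hypothetical guard; the 128 limit is the prod value quoted in the review.
final class NameLengthGuard {
  static final int PROD_NAME_MAX_LENGTH = 128;

  private NameLengthGuard() {}

  /** Returns value unchanged if it fits a VARCHAR(maxLength) column, else throws. */
  static String requireFits(String column, String value, int maxLength) {
    if (value == null) {
      throw new IllegalArgumentException(column + " must not be null");
    }
    // MySQL VARCHAR lengths count characters, so count code points, not UTF-16 units.
    int length = value.codePointCount(0, value.length());
    if (length > maxLength) {
      throw new IllegalArgumentException(
          column + " is " + length + " characters; limit is " + maxLength);
    }
    return value;
  }
}
```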

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "id", nullable = false)
private Long id;
Review comment (Member):

Is this an auto-increment ID or a primary key?

@Column(name = "table_uuid", nullable = false, length = 36)
private String tableUuid;

@Column(name = "database_id", nullable = false, length = 255)
Review comment (Collaborator):

Can we use only database_name for consistency?

-- Optimizer Service Schema
-- Compatible with MySQL (production) and H2 in MySQL mode (tests).
CREATE TABLE IF NOT EXISTS table_operations (
id VARCHAR(36) NOT NULL,
Review comment (Collaborator):

Can we consider adding indexes for these tables too?


/** When the operation completed, as recorded by the complete endpoint. */
@Column(name = "submitted_at", nullable = false)
private Instant submittedAt;
Review comment (Collaborator):

Should this be completionTime instead?

@Builder(toBuilder = true)
@NoArgsConstructor
@AllArgsConstructor
public static class CommitDelta {
Review comment (Collaborator):

Does this also require @JsonIgnoreProperties? It could provide forward compatibility, or a safeguard during upgrades, in case new fields are added.
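A plain Jackson `new ObjectMapper()` fails on unknown properties by default (`DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES`), so a JSON column written by a newer build could break reads on an older one; `@JsonIgnoreProperties(ignoreUnknown = true)` relaxes this per class. The stdlib-only analogue below is not Jackson: it assumes the JSON has already been decoded into a `Map`, and `CommitDeltaReader` is a hypothetical name:

```java
import java.util.Map;
import java.util.Set;

// Stdlib-only analogue of unknown-property handling; this is NOT Jackson.
final class CommitDeltaReader {
  // Fields this (older) reader knows about; only addedSizeBytes comes from the PR.
  private static final Set<String> KNOWN_FIELDS = Set.of("addedSizeBytes");

  private CommitDeltaReader() {}

  /**
   * Reads addedSizeBytes from a decoded JSON object. With ignoreUnknown=false an
   * unexpected key fails the read (like a default ObjectMapper); with
   * ignoreUnknown=true it is skipped (like @JsonIgnoreProperties(ignoreUnknown = true)).
   */
  static long readAddedSizeBytes(Map<String, Object> json, boolean ignoreUnknown) {
    if (!ignoreUnknown) {
      for (String key : json.keySet()) {
        if (!KNOWN_FIELDS.contains(key)) {
          throw new IllegalArgumentException("Unrecognized field: " + key);
        }
      }
    }
    // Missing field defaults to 0; JSON numbers may decode as Integer or Long.
    Object value = json.getOrDefault("addedSizeBytes", 0L);
    return ((Number) value).longValue();
  }
}
```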


Labels

None yet

Projects

None yet


3 participants